-
- Assembly Language for Veggies (And C programmers) Part 1.
-
- So you wanna be an Assembly Language programmer? OK, no problem! This DOC is
- designed to introduce you to the basics of ASM and the concepts behind it. I
- will be providing examples and some demo routines along the way, along with
- cross references and examples from other languages to clarify certain points.
-
- OK, so here goes...
-
- When you program in assembly language, you have complete and utter control of
- the computer, and everything it does. YOU get to choose EXACTLY how it behaves
- under your program. You can directly access any hardware and do anything - the
- only limit is your skill.
-
- WHAT YOU GET
-
- Basically, assembly programs talk directly to the 8088, 8086, 80188, 80186,
- 80286, 80386, or 80486 IC inside your machine. This is a custom chip designed
- by Intel and is called the CPU (Central Processing Unit). We begin by looking
- into these chips. Your machine, depending upon model, will use one of these
- chips. XT's have either 8088's, 8086's or their NEC clones, the V20's and
- V30's (the NEC chips are 100% compatible), whilst the AT's use the 80286
- chip. The 8018x series turns up only rarely in PC-type machines (but it is
- used!). The newer fast machines use 80386 or 80486 chips, hence their names.
- All the chips are "Upward Compatible" - that means that anything the 8088
- could do, all the later chips can do too, only faster. The 186 and 286 added
- more instructions - the 386 & 486 can do those as well. So you see that the
- 486 is king of the mountain, but will do the exact same job as an 8088 (only
- about 30 times faster!) if required.
-
- Because of this upward compatibility, we can write a program that works on
- an 8088 and expect it to execute correctly on any IBM design, regardless of
- CPU, UNLESS we use instructions specifically for one of the later chips
- (which is nearly never).
-
- So, to program these chips, one requires an understanding of them.... Here
- goes. The chip has the ability to execute machine code instructions. This is
- the most important job of the chip. It reads an INSTRUCTION from computer
- memory, figures out what the instruction means, and executes it, then gets the
- next instruction. That is ALL that a CPU is capable of doing!!!! As long as a
- computer is operating, it is doing this...from the first second you switch it
- on, until you switch it off again....
-
- Even when a machine has "Crashed" it can still be doing something - and
- usually is - but what it is doing is useless and won't allow the operator a
- chance to send it instructions to tell it to stop its useless activity. The
- only way to stop a CPU from doing its job is to HOLD the RESET button on the
- computer down, or to switch power off.
-
- Thus you see, you must have a logical set of instructions with a correct start
- point, and a correct end point. The CPU keeps track of what it is doing with a
- set of REGISTERS. The registers are of utmost importance to the programmer,
- for without them he would be lost.
-
- Here are the registers of the 8088 series (common to all models):
-
- AX, BX, CX, DX     SP, BP, SI, DI     CS, DS, ES, SS     IP, F
-
- The letters are the standard names used by common agreement. All
- registers are 16 bits wide - that is, they can hold a number from 0 to FFFF
- hex. They are grouped according to use:
-
- IP - Instruction Pointer, is used internally by the CPU to keep track of what
- instruction it should execute NEXT... i.e. a marker of where in memory it is
- up to.
-
- F - Flags, also internal to the CPU, is a set of 1 bit markers that can be
- either 0 or 1 to indicate a certain CPU status. The CPU has a set of
- instructions built in for testing individual status bits.
-
- CS - code segment, the memory segment of the executing program. (more on
- segments to come in a tic..) - this will be set upon startup of your program
- and is usually NEVER touched.
-
- DS - Data segment, the default segment to get data from - used by
- some instructions for transferring data about in memory.
-
- ES - Extra segment - same as DS, but totally user definable.
-
- SS - Stack segment - Like DS, but only for stack operations. not normally
- touched by user.. see section on stack.
-
- SP - Stack pointer - a bit akin to IP, but for stack operations.
-
- BP - Base pointer - general 16 bit register for user use.
-
- SI - Source index - used by some instructions for data transfer. For user
- use.
-
- DI - Destination index - same as SI.
-
- AX - Accumulator. 16 bit general register for user use. Multiply, divide
- and some other instructions always work through this register.
-
- BX - Base - general register for user use - also used to hold an address in
- memory references.
-
- CX - Count - general register for user use - also used in some block
- movement operations as a loop counter.
-
- DX - Data - general register for user use - also used for I/O port addresses
- and 32 bit math operations.
-
- To keep things flexible, AX, BX, CX and DX can each be divided into two 8 bit
- registers... Note: These are not extra, separate registers, simply a way of
- accessing the same register 8 bits at a time!! The 8 bit versions are called
- AH and AL, BH and BL etc... not too surprisingly, AH is the top (High) 8 bits
- of AX, whilst AL is the bottom (Low) 8 bits...
-
- Thus a program that stores 67ac into AX could just as easily store 67 into AH
- and ac into AL - it would result in the same thing - AX would now equal 67ac.
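-
- For the C programmers, here's a rough sketch of the same idea. The function
- names (make_ax, high_byte, low_byte) are made up for this doc and are not
- part of any library, and a C integer is only standing in for the real
- register.
-
```c
#include <assert.h>
#include <stdint.h>

/* Glue a high byte (AH) and a low byte (AL) together into 16 bits (AX). */
uint16_t make_ax(uint8_t ah, uint8_t al)
{
    return (uint16_t)(((uint16_t)ah << 8) | al);
}

/* Top 8 bits of a 16 bit value: what AH would hold. */
uint8_t high_byte(uint16_t ax)
{
    return (uint8_t)(ax >> 8);
}

/* Bottom 8 bits of a 16 bit value: what AL would hold. */
uint8_t low_byte(uint16_t ax)
{
    return (uint8_t)(ax & 0xFF);
}

/* The example from the text, written as a self-check. */
void check_67ac(void)
{
    assert(make_ax(0x67, 0xAC) == 0x67AC);
}
```
-
- So make_ax(0x67, 0xAC) hands back 0x67AC, and high_byte / low_byte pull it
- apart again. In ASM no function is needed at all - AH and AL simply ARE the
- two halves of AX.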
-
- One important concept to be grasped is that the registers are just like
- pigeon holes.... they just hold a number. That number can be an address, the
- ASCII code for a letter, the result of a math instruction or whatever. The CPU
- only knows it's got a number... thus, there's no such thing as:
-
- Var
-
- cx : word;
- al : char;
-
- or similar... It overcomes a big hassle in many languages... in PASCAL one
- can't take a number variable and drop it into the middle of a string, one must
- use the STR() function... not in ASM... one just umm.... uses it! Thus there
- are no "conversion" functions built in, or needed.... makes things a LOT
- simpler at times!
-
- As you have gathered, the 8088 series are 16 bit CPU's - called this because
- all the registers are 16 bit, and the data paths inside the chips are 16 bit
- also! (Funny 'bout that)... BUT they were designed to use up to 1 MB of
- memory. (Take my word for it) .... The problem is that 1 Meg requires 20 bits
- to count up all the combinations... how does one count to 20 bits with 16 bit
- registers? Impossible! - YES!! .... so the designers thought that instead
- of inventing a 20 bit CPU they'd design SEGMENTATION. This is one thing new
- programmers come to hate! It's easy if you follow it carefully, but more often
- than not people stuff it up. This is where the segment registers come into
- play.
-
- Memory is accessed using a combination of two 16 bit registers... the segment
- and the offset... Valid combinations include: CS:IP (for where to get the
- next instruction from), SS:SP (stack location), DS:SI, ES:DI and more... Note
- that a SEGMENT register must come first (CS, DS, ES, SS) - you can't do AX:DI
- - it just isn't allowed. This is a hardware restriction, but in practice it's
- not a hassle.
-
- Here's the math for working out which address you're at...
-
- The segment registers point to the start of a 64k "Chunk" of RAM, whilst the
- offset points to the byte within that chunk.
-
- (All addresses in HEX notation)
-
- You can have many combinations that relate to the same physical address...
- Thus: 0000:0401 is the same as 0040:0001, f000:a000 is the same as fa00:0000
-
- The CPU works out the address by addition, like this:
-
- Segment register (moved 1 digit left):   00000      00400
- plus offset register:                     0401       0001
- -----------------------------------------------------------
- equals:                                  00401      00401
- -----------------------------------------------------------
-
- Note how the result is 5 hex digits long - that's 20 bits in binary. The
- segment is moved one hex digit to the left (the same as multiplying it by 16,
- since 1 hex digit = 4 bits) before the offset is added. That means a segment
- can start on any 16 byte boundary, and each one covers a 64k chunk from there.
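-
- The same sum in C, as a sketch only. physical_address is a made-up name, and
- the wrap back to 20 bits at the end assumes an 8088/8086, which only has 20
- address lines.
-
```c
#include <assert.h>
#include <stdint.h>

/* 20 bit physical address from a 16 bit segment and a 16 bit offset:
   shift the segment left 4 bits (one hex digit), then add the offset.
   Assumption: 8088/8086 behaviour, so anything above 20 bits wraps around. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    uint32_t addr = ((uint32_t)segment << 4) + offset;
    return addr & 0xFFFFFu;
}

/* The example from the text: two different pairs, one physical address. */
void check_same_address(void)
{
    assert(physical_address(0x0000, 0x0401) == physical_address(0x0040, 0x0001));
}
```
-
- physical_address(0x0040, 0x0001) and physical_address(0x0000, 0x0401) both
- come out as 0x00401, which is the table above done by machine.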
-
- By the way, get used to hex, it's the usual way of referring to register
- contents. It's always a 4 DIGIT number for a 16 bit register, a 2 DIGIT
- number for 8 bit, or a 5 DIGIT number for 20 bit. The conversion goes thus:
-
- Binary: 1 0 1 0 | 0 0 0 1 | 1 1 0 0 | 1 0 0 1
-
- Take the number in groups of 4 bits. A hex digit is base 16 - there are 16
- possibilities per digit (decimal offers 10 [0-9]); hex has 0-9 and a-f [16
- varieties].
-
- You get 16 combinations in 4 bits - from 0 0 0 0 to 1 1 1 1 [0-f]
-
- So the number above is: a1c9
-
- Remember that each bit has a "weight" thus:
-
- 8 4 2 1 - weight
-
- 0 1 1 0 - binary number
-
- To convert quickly, take a group of 4 bits and mentally add the weights of all
- "1" bits - in this example 4+2, and the result is 6. The hex for this binary
- is 6. Note that in hex counting, 9+1=a, not 10!!! SO:
-
- 1 1 1 1 = 8+4 (c) + 2 (e) + 1 (f) = F hex.
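-
- If you'd rather see the weights trick as code, here's a small C sketch of it.
- binary_to_hex is a made-up helper, not a library routine, and it assumes the
- input is a string of '0' and '1' characters whose length is a multiple of 4.
-
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert a string of binary digits to hex digits, 4 bits at a time, by
   adding the weights 8, 4, 2, 1 of every '1' bit in each group.
   out must have room for strlen(bits) / 4 + 1 characters. */
void binary_to_hex(const char *bits, char *out)
{
    static const char digits[] = "0123456789abcdef";
    static const int weight[4] = { 8, 4, 2, 1 };
    size_t n = strlen(bits);
    size_t i, j;

    assert(n % 4 == 0);                 /* whole groups of 4 bits only */
    for (i = 0; i < n / 4; i++) {
        int value = 0;
        for (j = 0; j < 4; j++) {
            char c = bits[4 * i + j];
            assert(c == '0' || c == '1');
            if (c == '1')
                value += weight[j];     /* add the weight of each 1 bit */
        }
        out[i] = digits[value];         /* 0-9 stay as is, 10-15 become a-f */
    }
    out[n / 4] = '\0';
}
```
-
- Feed it "1010000111001001" and a 5 character buffer and you get "a1c9" back,
- the same answer as the worked example above.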
-
-
- That is all the CPU provides for you to use!!! (And all you need)... Here's
- how...
-
-
- THE BASIC IBM PC
-
- We begin our examples by looking at a basic IBM PC equipped with <say> a
- floppy, a hard drive, some RAM and a video card, running MS-DOS.
-
- OK, when your program is started, it is given access to all available memory
- from wherever DOS has currently used up to the end of physical memory. This
- could be as much as 600k or maybe even more under DOS 5.0, or as little as 30
- or 60k in a very small multitasking window. Your program has permission to do
- anything to this block of memory, and its contents at load time are garbage.
-
- The program begins execution at its first instruction (CS:IP will initially
- point here) and wanders through, following the program to the end.
-
- In the IBM PC the CS:IP, DS, ES and SS:SP are all preset for you to valid,
- correct settings when your program is loaded. Further, CS, DS, ES (and usually
- SS) will all be equal. (This describes .COM programs, which is the kind we'll
- be writing here.)
-
- Because of the 64k segmentation limitation, everyone seems to do things in 64k
- chunks, and DOS is no exception. Your program always begins at CS:0100 (the
- first 100 hex, or 256 decimal, bytes are filled with information to be used by
- the program if needed) and the SS:SP is usually placed at the very end of the
- segment (ie SS=CS, SP=FFFE)
-
- ABOUT THE STACK
-
- The stack is vital to the operation of any program. It is for holding
- temporary addresses and values during program execution, and can be used by
- the user or the CPU at any time. Thus, a valid stack must always be
- maintained. Whenever an instruction executes the equivalent of a BASIC GOSUB,
- the address of where to go upon RETURN is saved on the stack. This is a 16 bit
- number (the IP to return to), which takes 2 bytes, thus the stack starts at
- FFFE and not FFFF. When a value is stored, the SP is decreased by 2, so it
- then points to FFFC. Don't ask why it grows downward, it just does.... the
- lower the SP, the bigger the stack. Again, when the RETURN is executed, the SP
- has 2 added to it, and again becomes FFFE.
- More on the stack later.
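-
- Here's a toy model of that in C, just to show SP moving. It is a sketch under
- assumptions: Stack, stack_init, push16 and pop16 are names invented for this
- doc, and only one 64k stack segment and its SP are modelled.
-
```c
#include <assert.h>
#include <stdint.h>

/* A toy model of the stack segment: 64k of memory plus the SP offset. */
typedef struct {
    uint8_t  mem[0x10000];  /* the 64k chunk that SS points at */
    uint16_t sp;            /* stack pointer, an offset into mem */
} Stack;

/* Start with an empty stack at the very end of the segment. */
void stack_init(Stack *s)
{
    s->sp = 0xFFFE;
}

/* Store a 16 bit value: SP drops by 2, the value goes in at SS:SP
   (low byte first, as the 8088 stores words). */
void push16(Stack *s, uint16_t value)
{
    assert(s->sp >= 2);                 /* room left in the segment */
    s->sp = (uint16_t)(s->sp - 2);
    s->mem[s->sp] = (uint8_t)(value & 0xFF);
    s->mem[s->sp + 1] = (uint8_t)(value >> 8);
}

/* Fetch the 16 bit value at SS:SP, then SP rises by 2 again. */
uint16_t pop16(Stack *s)
{
    uint16_t value;
    assert(s->sp <= 0xFFFE);            /* a whole word must fit */
    value = (uint16_t)(s->mem[s->sp] | (s->mem[s->sp + 1] << 8));
    s->sp = (uint16_t)(s->sp + 2);
    return value;
}
```
-
- After stack_init the SP is FFFE. One push16 takes it to FFFC, and the
- matching pop16 hands the value back and puts SP at FFFE again, just like the
- GOSUB and RETURN described above.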
-
- Now it's time to see what is available to our program when it's run.
-
- IBM thought they'd give us a set of interface routines for using the hardware
- they'd built in. Nice of them that, saves us from directly manipulating the
- hardware which is usually a tricky and weird task! These are called the BIOS
- routines and are built into a chip on the computer's hardware. They are
- responsible for starting the computer when powered up, and also loading the
- operating system MS-DOS.
-
- DOS also supplies a set of routines for working with DOS - these are called the
- DOS routines (No shit!) and are available whenever DOS is in memory.
-
- There's stuff like reading and writing to disk, screen, etc., getting memory
- sizes and all sorts. See a good ASM book for details - there are hundreds of
- them and they take about 200 pages of text to fully cover - I'm not typing
- that lot out again!!!
-
- In fact, most of your programs will simply be loading up and calling these
- routines... Here's a simple example (Type this into A86, it'll work !!)
-
-
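- ; Demo program1
-
- begin:  jmp start               ; hop over the data below
-
- string  db 'Hi there!!$'        ; the text to print, ended by a $
-
- start:  mov dx,offset string    ; DX = address of the text
-         mov ah,09               ; AH = 9, the DOS "write string" function
-         int 021                 ; call DOS (A86 reads 021 as hex 21)
-         int 020                 ; finished, hand control back to DOS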
-
-
- Now that will be very confusing to you, but it's a simple program in assembler
- (Can you guess what it does?) Let's look at it line by line.
-
- ; Demo program1 --- any text after a ; is ignored by the assembler - it's a
- comment for the human reader, not 8088 code. This could be left out without
- any problem. It does not affect the size of the final code.
-
- begin: jmp start - Here's our first instruction. Begin and start are LABELS
- used by the assembler to refer to an address... note how there are no hardware
- addresses written in... I could have simply written the raw address (something
- like JMP 010F) but it's much easier to use a label. That way if I added more
- between begin: and start: I would not have to recalculate the address. The
- assembler works out the address at assembly time and substitutes it instead.
-
- string db '.....$' - String is another label. db means define byte. This is
- how we reserve memory. Everything between the quotes is stored into the
- program and appears in memory at load time, referenced by the string label.
-
- start: mov dx,offset string - this loads the DX register with the ADDRESS of
- the string label. Note the word offset. This means you want the address of the
- label, not what is at that address.
-
- mov ah,09 - loads the AH register with 09 hex. This is needed by the next
- instruction.
-
- INT 021 - Call MS-DOS's built in routines. They see that AH=09, decide that
- you want the write string to screen routine and display the text starting at
- the location in DX (well, really DS:DX, but as I said, DS was set up for us
- before the program began) until they see a $ symbol. The routine returns to
- our program when the $ symbol is encountered. The $ is not written to the
- screen.
-
- INT 020 - call another DOS routine. This one ends our program and returns
- control to the program that started it (in most cases COMMAND.COM).
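-
- For the C programmers, here's roughly what that write string routine is doing
- for us. This is only a sketch of the behaviour described above, not how DOS is
- really written, and print_dollar_string is a made-up name. The stop at the
- end of a C string is a safety net added for C; going by the description
- above, the real routine just looks for the $.
-
```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write characters from s until a '$' is met. The '$' itself is not
   written. Returns how many characters were written.
   Assumption for C only: also stop at the terminating '\0'. */
size_t print_dollar_string(const char *s, FILE *out)
{
    size_t count = 0;

    assert(s != NULL && out != NULL);
    while (s[count] != '$' && s[count] != '\0') {
        fputc(s[count], out);
        count++;
    }
    return count;
}
```
-
- Calling print_dollar_string("Hi there!!$", stdout) writes Hi there!! and
- returns 10, the number of characters before the $.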
-
-
- To fully understand all about what the hell is going on with all these INT's,
- I strongly suggest you invest in one of these books:
-
- The Peter Norton Programmer's Guide to the IBM PC - Peter Norton. Try to get
- edition #2 but if you can only get a first ed copy or one's going cheap grab
- it - they're pretty good (I still use an ed.1 copy!)
-
- Advanced MS-DOS - Ray Duncan. Only buy 2nd ed. 1st ed. was fairly limited and
- not really worth the money - it lacks any coverage above dos 3.0....
-
- Nothing else is worth your money. I'll make the occasional page cross
- reference (esp. to the Norton book which I feel is the better of the two)
- from time to time..
-
- ALSO
-
- Scab from your favourite leeching BBS a copy of A86 V3.21 or later, and D86 to
- go with it... this is the assembler I'll be using in the future... I'll
- consider demonstrating MASM if you really want me to, but I don't know a hell
- of a lot about it and don't really want to learn... I only know enuf to know
- what a basic program might need.
-
- This brings lesson 1 pretty much to a close... get yourself one of these
- books, delve into it, get A86, type in the demo, absorb as much as you can
- then write me back with your questions and problems!
-
- I'll be starting lesson 2 soon!.... Cya there.
-
- .\\erlin
-